-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Optimize performances of str_pad()
#19272
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
ext/standard/string.c
Outdated
@@ -5737,6 +5737,27 @@ PHP_FUNCTION(substr_count) | |||
} | |||
/* }}} */ | |||
|
|||
static inline void php_str_pad_fill(char *dest, size_t pad_chars, const char *pad_str, size_t pad_str_len) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
static inline void php_str_pad_fill(char *dest, size_t pad_chars, const char *pad_str, size_t pad_str_len) { | |
static zend_always_inline void php_str_pad_fill(char *dest, size_t pad_chars, const char *pad_str, size_t pad_str_len) { |
Using zend_always_inline
increases the likelihood that the function will actually be inlined.
Also, instead of passing char *dest
and size_t pad_chars
as separate arguments, how about accepting a zend_string
and extracting the values within this inline function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea! PR updated with both suggestions
fbfa654
to
24d2823
Compare
24d2823
to
ba1eb4c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel that this makes the API of php_str_pad_fill()
a little safer to use, because it makes less assumptions about the target pointer.
With regard to the inlining: I generally trust the compiler to make better than decisions than I can do myself. I would make the function just static void
without any inlining hints and let the compiler decide.
for (i = 0; i < left_pad; i++) | ||
ZSTR_VAL(result)[ZSTR_LEN(result)++] = pad_str[i % pad_str_len]; | ||
if (left_pad > 0) { | ||
php_str_pad_fill(result, left_pad, pad_str, pad_str_len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
php_str_pad_fill(result, left_pad, pad_str, pad_str_len); | |
php_str_pad_fill(ZSTR_VAL(result) + ZSTR_LEN(result), left_pad, pad_str, pad_str_len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SakiTakamachi advised otherwise if I get it right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It appears so, but the suggestion doesn't really make sense, since pad_chars
cannot be piggy-backed on the zend_string. Perhaps Saki meant to pass pad_str
and pad_str_len
as a zend_string*
(which would make sense to me).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes the code a bit slower, going from ~48ms per run to ~58ms. Do we still want this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it’s fine to go with the faster one! :)
Please also see #19220 (comment) and #19220 (comment) for benchmarking advice. |
Thanks for the link, I'm having a look to hyperfine for this PR and #19276 |
ba1eb4c
to
78dd005
Compare
PR description updated to use hyperfine |
The |
I think that my system-wide install right now is PHP 8.5 nightly. Just to be sure, I rerun against master and the branch:
|
AWS x86_64 (c7i.24xl)
Laravel 11.1.2 demo app - 30 consecutive runs, 100 requests (sec)
Symfony 2.6.0 demo app - 30 consecutive runs, 100 requests (sec)
Wordpress 6.2 main page - 30 consecutive runs, 20 requests (sec)
bench.php - 25 consecutive runs (sec)
micro_bench.php - 25 consecutive runs (sec)
|
The above comment was due to test the real time benchmark :) Unfortunately, I guess most results should show zero difference, which is not currently the case :( The wordpress results are especially weird. I'll have to dig into it... cc. @iluuu1994 @arnaud-lb |
If I can be of any help, don't hesitate to tell me! It is indeed surprising |
AWS x86_64 (c7i.24xl)
Laravel 11.1.2 demo app - 30 consecutive runs, 100 requests (sec)
Symfony 2.6.0 demo app - 30 consecutive runs, 100 requests (sec)
Wordpress 6.2 main page - 30 consecutive runs, 20 requests (sec)
bench.php - 25 consecutive runs (sec)
micro_bench.php - 25 consecutive runs (sec)
|
@kocsismate FYI, I've written a small benchmarking helper for myself. https://github.com/iluuu1994/php-benchmark-diff It differs slightly from things like hyperfine in a few ways:
Not sure if this is useful to you, but wanted to mention it in case it is. It doesn't support JSON output yet. |
@iluuu1994 Nice thank you very much for the input! I've seen Tim's comment somewhere about hyperfine (and read a blog post about it by Volker), but didn't know how it exactly differs from what I do, but based on your description it looks very promising! Specifically, I'm pretty much interested about the p-value. I think my benchmark also discards some of the results, but I should look into it how I exactly implemented it. Alternating tests is also a very clever idea. I'll try to implement it at some point, because I'm still pretty much disappointed by my results :( Most tests used to be stable a few months ago, but they stopped being reliable a while ago. And curiously, microbenchmarks perform the worst. |
I see I kept the branch with the JSON output too. iluuu1994/php-benchmark-diff@master...json I planned to replace the Valgrind benchmark, but unfortunately collecting CPU cycles with |
I would like to propose this optimization for
str_pad()
. Here is the benchmark code:And the results with right padding:
Left padding results:
The idea is to avoid modulo operation in loops and copying char by char. Instead, this PR prefers the bulk copy approach.